Abstract: Head pose estimation is an important pre-processing step in many pattern recognition and computer vision systems, such as face recognition. Since the performance of face recognition systems is strongly affected by the pose of the face, accurately estimating head pose from a face image remains a challenging problem. In this paper, we present a novel method for head pose estimation. To improve the efficiency of the estimation, we first use the contourlet transform, a multiresolution, multidirectional transform, for feature extraction. Next, to reduce the dimension of the feature space and retain discriminative features, we apply PCA (Principal Component Analysis) followed by LDA (Linear Discriminant Analysis) to remove ineffective features. Finally, we apply k-nearest neighbor (k-NN) and minimum distance classifiers to classify the head pose. We use the publicly available FERET database to evaluate the performance of the proposed method. Simulation results show that the proposed method is more accurate than a previous method.

Index Terms: Contourlet transform, head pose estimation, k-NN, LDA, PCA.

I. Introduction 

Head pose estimation is the process of inferring the orientation of a face or head from an image containing the face. Humans constantly use head pose in non-verbal communication to convey attention and understanding when interacting with other people. Face recognition has improved significantly over the last two decades; however, robust and accurate face recognition remains a classic problem because of variations in the pose of the face. Therefore, pose-invariant face perception has been an active research topic for several years.

Several methods have been introduced in the literature since the late 1990s [1]. Each has its own advantages and disadvantages, and many studies have sought to overcome these disadvantages. Existing methods can be classified into four categories: 1) template matching methods [2], which use nearest-neighbor classification to find the most similar stored view for a new head image; 2) appearance-based methods [3], which use pattern classification or nonlinear regression tools to learn a functional mapping from image or feature data to a head pose measurement; 3) geometric methods [4], which determine pose from the relative configuration of facial landmarks (such as the eyes, mouth, and nose tip); and 4) dimensionality reduction methods [5]-[7], which seek a low-dimensional continuous manifold constrained by pose variations, into which new images can be embedded and then used for template matching or regression.

In this paper, we develop a new method to estimate head pose from a face image. The proposed method uses the contourlet transform to extract features from the image. To reduce the dimension of the feature space and increase class separability, we then apply Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). In the final step, the reduced features are classified with k-nearest neighbor (k-NN) and minimum distance classifiers.

The rest of the paper is organized as follows: Section II introduces the contourlet transform. Section III explains the proposed method. Simulation results and comparisons with other pose estimation systems are presented in Section IV. Finally, Section V concludes the paper.

II. Contourlet Transform 

Do and Vetterli [8] developed the contourlet transform based on an efficient two-dimensional (2-D) multiscale and directional filter bank that can deal effectively with images having smooth contours. Contourlets not only possess the main features of wavelets (namely, multiscale and time-frequency localization) but also offer a high degree of directionality and anisotropy. The main difference between contourlets and other multiscale directional systems is that the contourlet transform allows a different and flexible number of directions at each scale while achieving a nearly critical sampling rate.

To obtain a sparse expansion for images with smooth contours at a reasonable computational cost, the contourlet transform uses iterated filter banks. It is composed of a Laplacian Pyramid (LP) [9], [10] and a Directional Filter Bank (DFB) [11]. The LP first captures point discontinuities, and the DFB then links these point discontinuities into linear structures. At each level, the LP decomposition generates a downsampled lowpass version of the original image and a bandpass image, which is the difference between the original image and its prediction from the lowpass version. The bandpass image is then filtered in different directions: the DFB combines singular points lying along the same direction into a single coefficient.
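As a rough illustration of one LP level (lowpass filter and downsample, then subtract the interpolated prediction to get the bandpass image), the Python sketch below uses a simple separable 5-tap binomial kernel as a stand-in. The kernel and the function names `lp_level` and `lp_reconstruct` are assumptions for illustration only; the paper's toolbox uses '9-7' filters, and this is not its implementation.

```python
import numpy as np

# Hypothetical stand-in lowpass kernel (5-tap binomial); the paper uses '9-7' filters.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def _filter2d(img: np.ndarray) -> np.ndarray:
    """Separable 2-D lowpass filtering with mirror ('reflect') boundary extension."""
    pad = len(KERNEL) // 2
    padded = np.pad(img.astype(float), pad, mode="reflect")
    # 'valid' convolution removes exactly the padding, so the output keeps img's shape.
    rows = np.apply_along_axis(lambda r: np.convolve(r, KERNEL, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, KERNEL, mode="valid"), 0, rows)


def _upsample(a: np.ndarray, shape) -> np.ndarray:
    """Insert zeros between samples, then interpolate (x4 restores the DC gain)."""
    up = np.zeros(shape)
    up[::2, ::2] = a
    return 4.0 * _filter2d(up)


def lp_level(a_prev: np.ndarray):
    """Split A_{i-1} into a coarser image A_i and a bandpass image B_i."""
    a_next = _filter2d(a_prev)[::2, ::2]            # lowpass, then downsample by 2
    b = a_prev - _upsample(a_next, a_prev.shape)    # bandpass = original - prediction
    return a_next, b


def lp_reconstruct(a_next: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Invert one LP level: prediction from A_i plus the bandpass residual B_i."""
    return _upsample(a_next, b.shape) + b


rng = np.random.default_rng(0)
a0 = rng.random((120, 90))        # same size as the resized face images
a1, b1 = lp_level(a0)
a2, b2 = lp_level(a1)
print(a1.shape, b1.shape, a2.shape, b2.shape)  # (60, 45) (120, 90) (30, 23) (60, 45)
```

Because the bandpass image is defined as the residual of the prediction, `lp_reconstruct` recovers its input exactly (up to floating-point rounding) for any choice of kernel.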



The DFB divides the highpass image into directional subbands in the frequency domain. Fig. 1(a) shows the multiresolution and directional decomposition of the contourlet filter bank, which is composed of an LP and a DFB. To capture directional information, the bandpass images from the LP are fed into a DFB. Fig. 1(b) shows a DFB with level l = 3, which yields 2^3 = 8 real wedge-shaped frequency bands. Subbands 0-3 correspond to mostly horizontal directions, while subbands 4-7 correspond to mostly vertical directions.

The DFB divides the highpass image into 2^l wedge-shaped frequency bands after an l-level tree-structured decomposition. Since the DFB was designed to capture the high-frequency content of the input image, low-frequency content is poorly handled. In fact, the DFB alone does not provide a sparse representation for images, because with the frequency partition shown in Fig. 1(b), low frequencies would leak into several directional subbands. This is another reason to combine the DFB with a multiresolution decomposition, which removes the low frequencies of the input image before the DFB is applied.

Let A_0(n) be the input image, as shown in Fig. 1(a). The output of the J-level LP decomposition is a lowpass image A_J(n) and J bandpass images B_i(n), i = 1, 2, ..., J. That is, the i-th level of the LP decomposes the image A_{i-1}(n) into a coarser image A_i(n) and a highpass image B_i(n). Each highpass image B_i(n) is then further decomposed by an l_i-level DFB into 2^{l_i} highpass directional images C_{i,j}(n), j = 0, 1, ..., 2^{l_i} - 1. The combined result is a double iterated filter bank structure, called the contourlet filter bank, which implements the contourlet transform.

Fig. 1. Spectrum partitioning: (a) decomposition scheme of the contourlet transform, (b) 8-subband partition.

III. Proposed Head Pose Estimation Method

Fig. 2 shows the block diagram of the proposed head pose estimation algorithm. To compute the contourlet transform of the head image, we use the contourlet transform toolbox provided in [12]. First, the color face image is converted to a gray-scale image by averaging the R, G, and B components of each pixel. This conversion discards hue and saturation information while retaining the brightness (intensity) information. We use a pyramidal directional filter bank (PDFB) decomposition, with '9-7' filters for the multiscale decomposition stage and 'PKVA' filters for the directional decomposition stage [13], [14]. There are two levels of pyramidal decomposition, and the numbers of directional decomposition levels at the two pyramidal levels are set to … and 1. This decomposition yields six sub-bands, which form our feature space. Fig. 4 shows a typical image and the sub-bands obtained from the contourlet transform.
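The gray-scale conversion of Section III (averaging the R, G, and B components of each pixel) can be sketched in Python as follows; the function name `rgb_to_gray_mean` is hypothetical, and the paper's own experiments were run in MATLAB.

```python
import numpy as np


def rgb_to_gray_mean(rgb: np.ndarray) -> np.ndarray:
    """Gray level = (R + G + B) / 3 for every pixel of an H x W x 3 image."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB array")
    # Convert to float64 before averaging so the result is a real-valued image.
    return rgb.astype(np.float64).mean(axis=2)


pixel = np.array([[[30, 60, 90]]], dtype=np.uint8)  # a single 1 x 1 RGB pixel
print(rgb_to_gray_mean(pixel))  # [[60.]]
```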
Fig. 2. Block diagram of the proposed head pose estimation algorithm.



Not all of the extracted features are useful for pose estimation, so the improper ones should be removed from the feature space. Common approaches for reducing the feature space dimension include PCA and LDA. If LDA is used alone, the within-class scatter matrix can become singular, which degrades the performance of the system; this happens when there are fewer training samples than feature dimensions (in our experiments, 70 training images versus 2760 features). Hence, we apply PCA before LDA to overcome this problem. PCA is an unsupervised approach and is used here only as a preprocessing (whitening) step. LDA is a supervised algorithm, which makes it a good approach for discriminative dimension reduction. More details are presented in Section IV.
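A minimal NumPy sketch of PCA followed by LDA (the classic Fisherfaces arrangement) is shown below. It assumes PCA keeps N - c components (N training samples, c classes), a common choice that generally makes the within-class scatter matrix invertible; the paper instead chooses the number of features with the normalized cumulative eigenvalue criterion of Section IV. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def pca_fit(X: np.ndarray, n_components: int):
    """Return the mean and the leading principal directions (as columns)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T


def lda_fit(Z: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Fisher LDA on PCA-reduced data; returns at most (classes - 1) directions."""
    classes = np.unique(y)
    mu = Z.mean(axis=0)
    d = Z.shape[1]
    s_w = np.zeros((d, d))  # within-class scatter
    s_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        s_w += (Zc - mc).T @ (Zc - mc)
        diff = (mc - mu)[:, None]
        s_b += len(Zc) * (diff @ diff.T)
    # Generalized eigenproblem S_b w = lambda S_w w, solved via S_w^{-1} S_b.
    evals, evecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(evals.real)[::-1][: len(classes) - 1]
    return evecs[:, order].real


rng = np.random.default_rng(0)
n_classes, per_class, dim = 7, 10, 2760  # 7 poses x 10 training images, 2760 features
y = np.repeat(np.arange(n_classes), per_class)
X = rng.normal(size=(n_classes, dim))[y] + 0.5 * rng.normal(size=(y.size, dim))

mean, P = pca_fit(X, y.size - n_classes)  # N - c = 63 components
Z = (X - mean) @ P
W = lda_fit(Z, y)
features = Z @ W
print(features.shape)  # (70, 6)
```

Running LDA in the 63-dimensional PCA space rather than the raw 2760-dimensional space is what keeps `s_w` invertible with only 70 training vectors.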

After obtaining the reduced feature vector, the final step is to classify it. In this work, we use two different classifiers: minimum distance and k-nearest neighbor (k-NN). More details are given in the next section.
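The two classifiers can be sketched as follows, assuming Euclidean distance. The paper does not state the distance metric or the value of k, so `k=3` and the function names are hypothetical.

```python
import numpy as np
from collections import Counter


def min_distance_classify(train_X, train_y, test_X):
    """Assign each test vector to the class whose training mean is nearest."""
    classes = np.unique(train_y)
    centroids = np.stack([train_X[train_y == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(test_X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]


def knn_classify(train_X, train_y, test_X, k=3):
    """Majority vote among the k nearest training vectors."""
    dist = np.linalg.norm(test_X[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices sorted by distance
    # On a tied vote, Counter keeps first-seen order, so the closer neighbor's class wins.
    return np.array([Counter(train_y[row]).most_common(1)[0][0] for row in nearest])


train_X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_y = np.array([0, 0, 1, 1])
test_X = np.array([[0.2, 0.4], [4.8, 5.1]])
print(min_distance_classify(train_X, train_y, test_X))  # [0 1]
print(knn_classify(train_X, train_y, test_X, k=3))      # [0 1]
```

The minimum distance classifier needs only one stored vector per class, while k-NN keeps every training vector; with 10 training images per pose, both are cheap once the features have been reduced.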
Fig. 3. Different head poses with labels in the FERET database.

IV. Simulation Results and Discussion 

A. Database 

We need a database that contains face images with different head rotations, so we choose the FERET database [15]. In this database, lighting conditions, face size, and the ethnic descent of the subjects vary considerably. Each image is labeled with an index denoting its pose. We select the face rotations represented by the pl, hl, ql, fa, fb, qr, hr, and pr indices, which stand for profile left (head turned about 90 degrees left), half left (about 67.5 degrees left), quarter left (about 22.5 degrees left), regular frontal image, alternative frontal image (taken shortly after the corresponding fa image), quarter right (about 22.5 degrees right), half right (about 67.5 degrees right), and profile right (about 90 degrees right), respectively. All images are 512 x 768 pixels in PPM (Portable Pixel Map) format. The selected images include subjects with and without glasses. Fig. 3 shows some images from the FERET database.

We used MATLAB® to evaluate the performance of our method. First, the face area was cropped manually from each input image, and all images were then resized to 120 x 90 pixels. The fa and fb indices are treated as one class, which gives 7 pose classes. From each pose, we randomly select 150 images (1050 images in total). Of these, 10 images per pose are used for training and the remaining 140 images for testing.
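The data split described above (fa and fb merged into one class, 150 images per class, 10 for training and 140 for testing) can be sketched as follows. The label mapping, the seed, and the function name `split_per_class` are illustrative assumptions, not details from the paper.

```python
import numpy as np

# fa and fb share one label, so eight FERET pose codes map onto 7 classes.
POSE_TO_CLASS = {"pl": 0, "hl": 1, "ql": 2, "fa": 3, "fb": 3, "qr": 4, "hr": 5, "pr": 6}


def split_per_class(labels: np.ndarray, n_train: int, seed: int = 0):
    """Randomly choose n_train indices of each class for training; the rest are for testing."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.array(train_idx), np.array(test_idx)


n_classes = len(set(POSE_TO_CLASS.values()))      # 7
labels = np.repeat(np.arange(n_classes), 150)     # 150 images per class, 1050 in total
train_idx, test_idx = split_per_class(labels, n_train=10)
print(n_classes, len(train_idx), len(test_idx))   # 7 70 980
```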

B. Feature Classification 

Among the 6 sub-bands obtained from the contourlet transform (Fig. 4), we use those labeled b, c, e, and f, each of which is 30 x 23 pixels; sub-bands d and g are 60 x 90 pixels. We treat the value of each pixel as one feature, so the dimension of the feature space is 30 x 23 x 4 = 2760. Classifying such a high-dimensional feature vector is computationally expensive, and not all of these features are useful; many of them are redundant.

Fig. 4. (a) A typical gray-scale image, (b)-(g) sub-bands obtained after the contourlet transform.

As mentioned previously, PCA and LDA are used to remove this redundancy from the feature space. For this purpose, the normalized cumulative sum of the eigenvalues up to the i-th feature is calculated as follows:



$$\mathrm{NCsEv}(i) = \frac{\sum_{n=1}^{i} \mathrm{Ev}(n)}{\sum_{n=1}^{M} \mathrm{Ev}(n)}, \qquad i = 1, \ldots, M \tag{1}$$



where Ev(n) is the eigenvalue of the n-th feature (with the eigenvalues sorted in descending order) and M is the total number of features. NCsEv is shown in Fig. 5. Based on NCsEv(i), we keep 3 features and remove the others. For clarity, Fig. 5 shows NCsEv only for features 1 to 31; for features 32 to 2760, NCsEv equals 1. The scatter plot of the features obtained by PCA and LDA is shown in Fig. 6. It shows the feature values for the different classes and indicates that the proposed method separates the pose classes well.
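Eq. (1) and a threshold-based choice of how many features to keep can be sketched as follows. The threshold is a hypothetical parameter: the paper selects the number of features by inspecting the NCsEv curve in Fig. 5 and does not state a threshold. The function names are also illustrative.

```python
import numpy as np


def ncsev(eigenvalues) -> np.ndarray:
    """Eq. (1): cumulative sum of eigenvalues (largest first) divided by their total."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    return np.cumsum(ev) / ev.sum()


def n_features_for(eigenvalues, threshold: float) -> int:
    """Smallest i such that NCsEv(i) >= threshold (assumes 0 < threshold <= 1).

    NCsEv is non-decreasing for non-negative eigenvalues, so a binary search applies.
    """
    return int(np.searchsorted(ncsev(eigenvalues), threshold)) + 1


ev = [1.0, 4.0, 2.0, 3.0]          # toy eigenvalues, deliberately unsorted
print(ncsev(ev))                   # approximately [0.4, 0.7, 0.9, 1.0]
print(n_features_for(ev, 0.85))    # 3
```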

Tables I and II present the confusion matrices of the proposed method with the k-NN and minimum distance classifiers, respectively. With both classifiers, the proposed method classifies all test poses correctly (100% classification accuracy).
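For reference, a confusion matrix such as those in Tables I and II, and the accuracy derived from it, can be computed as sketched below (rows are true classes and columns are predicted classes, a convention assumed here). A 100% accuracy corresponds to a matrix whose off-diagonal entries are all zero. The toy labels are made up for illustration.

```python
import numpy as np


def confusion_matrix(true: np.ndarray, pred: np.ndarray, n_classes: int) -> np.ndarray:
    """Count pairs: row = true class, column = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true, pred), 1)  # unbuffered, so repeated (true, pred) pairs all count
    return cm


true = np.array([0, 0, 1, 1, 2])
pred = np.array([0, 1, 1, 1, 2])
cm = confusion_matrix(true, pred, 3)
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(accuracy)  # 0.8
```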

Islam et al. [16] used bilateral filtering and the wavelet transform in the pre-processing step, and isometric projection based subspace learning to extract discriminant feature vectors. They chose 5000 images from the FERET database covering five different poses: pl, ql, fa, qr, and ql. PCA, LDA, and isometric projection (IsoP) were used separately for dimensionality reduction, and the k-NN classifier was used to classify the feature vectors. They achieved recognition rates of 90.338%, 95.22%, and 97.07% for the PCA, LDA, and IsoP approaches, respectively. Table III compares the results of our method with the best results of [16]. As Table III shows, the proposed method is more accurate than [16], even though it distinguishes 7 pose classes rather than 5.

Fig. 5. Normalized cumulative summation of eigenvalues versus number of features.

TABLE I

Confusion matrix for k-NN classifier.

TABLE II

Confusion matrix for minimum distance classifier.

TABLE III

Performance comparison of different methods.

Fig. 6. Scatter plot of features obtained by PCA and LDA.

V. Conclusion 

In this paper, we proposed a novel method for head pose estimation that uses a small number of features. In the feature extraction step, we used the contourlet transform. Then, we used PCA and LDA to reduce the dimensionality of the feature space. To evaluate the proposed algorithm, 1050 images of seven different poses were selected from the FERET database, and k-NN and minimum distance classifiers were used to classify the poses. Experimental results demonstrate that the proposed algorithm achieves a 100% classification rate on the images selected from the FERET database.